26 research outputs found

    A computational framework for unsupervised analysis of everyday human activities

    In order to make computers proactive and assistive, we must enable them to perceive, learn, and predict what is happening in their surroundings. This presents us with the challenge of formalizing computational models of everyday human activities. For a majority of environments, the structure of the in situ activities is generally not known a priori. This thesis therefore investigates knowledge representations and manipulation techniques that can facilitate learning of such everyday human activities in a minimally supervised manner. A key step towards this end is finding appropriate representations for human activities. We posit that if we choose to describe activities as finite sequences of an appropriate set of events, then the global structure of these activities can be uniquely encoded using their local event subsequences. With this perspective, we investigate representations that characterize activities in terms of their fixed-length and variable-length event subsequences. We compare these representations in terms of their representational scope, feature cardinality, and noise sensitivity. Exploiting such representations, we propose a computational framework to discover the various activity-classes taking place in an environment. We model these activity-classes as maximally similar activity-cliques in a completely connected graph of activities, and describe how to discover them efficiently. Moreover, we propose methods for finding concise characterizations of these discovered activity-classes, from both a holistic and a by-parts perspective. Using such characterizations, we present an incremental method to classify a new activity instance into one of the discovered activity-classes, and to automatically detect whether it is anomalous with respect to the general characteristics of its membership class. Our results show the efficacy of our framework in a variety of everyday environments.
    Ph.D.
    Committee Chair: Aaron Bobick; Committee Member: Charles Isbell; Committee Member: David Hogg; Committee Member: Irfan Essa; Committee Member: James Reh
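
    The idea of activity-classes as groups of mutually similar activities in a fully connected graph can be illustrated with a small sketch. This is not the thesis's discovery algorithm: the function name `greedy_activity_clique`, the averaging rule, and the `min_avg_sim` threshold are assumptions chosen only to show one simple greedy way of growing such a group from a pairwise similarity matrix.

```python
import numpy as np

def greedy_activity_clique(sim, min_avg_sim=0.5):
    """Grow one group of mutually similar activities (hypothetical heuristic).

    sim is a symmetric (n, n) matrix of pairwise activity similarities, n >= 2.
    """
    n = sim.shape[0]
    seed = sim.astype(float).copy()
    np.fill_diagonal(seed, -np.inf)  # ignore self-similarity when picking the seed pair
    i, j = np.unravel_index(np.argmax(seed), seed.shape)
    members = [int(i), int(j)]
    candidates = set(range(n)) - set(members)
    while candidates:
        # Pick the candidate with the highest average similarity to current members.
        best = max(candidates, key=lambda c: sim[c, members].mean())
        if sim[best, members].mean() < min_avg_sim:
            break  # no remaining activity is similar enough to the group
        members.append(best)
        candidates.remove(best)
    return sorted(members)
```

    Removing the returned members from the matrix and repeating the call would yield further groups; how the thesis actually extracts and orders its cliques is not stated in the abstract.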

    Compact Random Feature Maps

    Kernel approximation using randomized feature maps has recently gained a lot of interest. In this work, we identify that previous approaches for polynomial kernel approximation create maps that are rank deficient, and therefore do not utilize the capacity of the projected feature space effectively. To address this challenge, we propose compact random feature maps (CRAFTMaps) to approximate polynomial kernels more concisely and accurately. We prove error bounds for CRAFTMaps, demonstrating their superior kernel reconstruction performance compared to previous approximation schemes. We show how structured random matrices can be used to efficiently generate CRAFTMaps, and present a single-pass algorithm using CRAFTMaps to learn non-linear multi-class classifiers. We present experiments on multiple standard datasets with performance competitive with state-of-the-art results.
    Comment: 9 page
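
    To make the notion of a random feature map for a polynomial kernel concrete, here is a minimal sketch of a basic Rademacher-product map whose inner products approximate the homogeneous kernel (x·y)^degree in expectation. It is not CRAFTMaps and does not address the rank deficiency the abstract discusses; the function name `random_poly_features` and its parameters are illustrative assumptions.

```python
import numpy as np

def random_poly_features(X, degree, n_features, rng):
    """Map rows of X so that Z[i] @ Z[j] approximates (X[i] @ X[j]) ** degree.

    Each output feature is a product of `degree` independent random
    +/-1 projections of the input (an assumed, generic construction).
    """
    d = X.shape[1]
    W = rng.choice([-1.0, 1.0], size=(degree, d, n_features))
    Z = np.ones((X.shape[0], n_features))
    for j in range(degree):
        Z *= X @ W[j]  # multiply in the j-th random projection
    return Z / np.sqrt(n_features)

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm inputs keep the variance small
Z = random_poly_features(X, degree=2, n_features=20000, rng=rng)
approx = Z[0] @ Z[1]
exact = (X[0] @ X[1]) ** 2
```

    Both points must be mapped with the same random matrices, which is why they are passed in a single call. The estimate is unbiased, and its error shrinks as `n_features` grows.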

    Audio-Visual Flow - A Variational Approach to Multi-Modal Flow Estimation

    © 2004 IEEE. Presented at the 2004 IEEE International Conference on Image Processing (ICIP 2004), 24-27 October 2004, Singapore. DOI: 10.1109/ICIP.2004.1421626
    Just as a motion field is associated with a moving object, an audio field can be associated with an object that can behave as a sound source. The flow field of such a sound source, which moves over time, would have not only an optical component but also an audio component, something we call audio-visual flow. In this paper we present a common structure-tensor-based variational framework for dense audio-visual flow-field estimation. The proposed scheme improves the rank of the local structure tensor by incorporating an audio information channel which is substantially uncorrelated with the complementing visual information channel. The scheme allows ascribing weights to individual sensor modalities based on the confidence in their corresponding measurements. Results are presented to demonstrate how combining multiple modalities in our proposed framework can provide a possible solution to temporary full visual occlusions.
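
    The claim that an additional, uncorrelated channel improves the rank of the local structure tensor can be checked numerically. The sketch below is a simplification under stated assumptions: it uses a 2D spatial tensor at a single point, treats the audio channel as just another gradient vector, and the function name `weighted_structure_tensor` and the example gradients are hypothetical.

```python
import numpy as np

def weighted_structure_tensor(gradients, weights):
    """Sum of weighted outer products g g^T over channels (assumed form)."""
    dim = len(gradients[0])
    J = np.zeros((dim, dim))
    for g, w in zip(gradients, weights):
        g = np.asarray(g, dtype=float)
        J += w * np.outer(g, g)  # each channel contributes a rank-1 term
    return J

visual_grad = [1.0, 0.0]  # hypothetical visual gradient at one point
audio_grad = [0.3, 0.8]   # hypothetical audio-field gradient, not parallel to it

J_visual = weighted_structure_tensor([visual_grad], [1.0])
J_both = weighted_structure_tensor([visual_grad, audio_grad], [1.0, 0.5])

rank_visual = np.linalg.matrix_rank(J_visual)
rank_both = np.linalg.matrix_rank(J_both)
```

    With a single channel the tensor is rank 1, so flow is constrained along only one direction. A second channel whose gradient is not parallel to the first raises the rank to 2, and the per-channel weights play the role of the modality confidences mentioned in the abstract.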

    LEMaRT: Label-Efficient Masked Region Transform for Image Harmonization

    We present a simple yet effective self-supervised pre-training method for image harmonization which can leverage large-scale unannotated image datasets. To achieve this goal, we first generate pre-training data online with our Label-Efficient Masked Region Transform (LEMaRT) pipeline. Given an image, LEMaRT generates a foreground mask and then applies a set of transformations to perturb various visual attributes (e.g., defocus blur, contrast, saturation) of the region specified by the generated mask. We then pre-train image harmonization models by recovering the original image from the perturbed image. Second, we introduce an image harmonization model, namely SwinIH, by retrofitting the Swin Transformer [27] with a combination of local and global self-attention mechanisms. Pre-training SwinIH with LEMaRT results in a new state of the art for image harmonization, while being label-efficient, i.e., consuming less annotated data for fine-tuning than existing methods. Notably, on the iHarmony4 dataset [8], SwinIH outperforms the state of the art, i.e., SCS-Co [16], by a margin of 0.4 dB when it is fine-tuned on only 50% of the training data, and by 1.0 dB when it is trained on the full training dataset.
    Comment: Accepted by CVPR'23, 19 page
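
    The data-generation step described above (generate a mask, perturb visual attributes inside it, use the original image as the recovery target) can be sketched as follows. This is not the authors' pipeline: the rectangular mask, the two perturbations, their ranges, and the name `make_pretraining_pair` are assumptions made for illustration.

```python
import numpy as np

def make_pretraining_pair(image, rng):
    """Return (perturbed, mask, target) for a float image of shape (H, W, 3) in [0, 1]."""
    h, w, _ = image.shape
    # Hypothetical mask generator: one random axis-aligned rectangle.
    y0 = rng.integers(0, h // 2)
    x0 = rng.integers(0, w // 2)
    y1 = y0 + rng.integers(1, h // 2 + 1)
    x1 = x0 + rng.integers(1, w // 2 + 1)
    mask = np.zeros((h, w), dtype=bool)
    mask[y0:y1, x0:x1] = True

    # Assumed perturbations: saturation scaling followed by contrast scaling.
    saturation = rng.uniform(0.5, 1.5)
    contrast = rng.uniform(0.5, 1.5)
    gray = image.mean(axis=2, keepdims=True)
    changed = gray + saturation * (image - gray)
    changed = 0.5 + contrast * (changed - 0.5)
    changed = np.clip(changed, 0.0, 1.0)

    # Only the masked region is perturbed; the rest of the image is untouched.
    perturbed = np.where(mask[..., None], changed, image)
    return perturbed, mask, image

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))
perturbed, mask, target = make_pretraining_pair(image, rng)
```

    A harmonization model would then be trained to map `perturbed` (together with `mask`) back to `target`, which requires no manual annotation.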

    A Visualization Framework for Team Sports Captured using Multiple Static Cameras

    Accepted manuscript (unedited early version). DOI: http://dx.doi.org/10.1016/j.cviu.2013.09.006
    We present a novel approach for robust localization of multiple people observed using a set of static cameras. We use this location information to generate a visualization of the virtual offside line in soccer games. To compute the position of the offside line, we need to localize players' positions and identify their team roles. We solve the problem of fusing corresponding players' positional information by finding minimum weight K-length cycles in a complete K-partite graph. Each partite set of the graph corresponds to one of the K cameras, whereas each node of a partite set encodes the position and appearance of a player observed from a particular camera. To find the minimum weight cycles in this graph, we use a dynamic programming based approach that varies over a continuum from maximally to minimally greedy in terms of the number of graph-paths explored at each iteration. We present proofs for the efficiency and performance bounds of our algorithms. Finally, we demonstrate the robustness of our framework by testing it on 82,000 frames of soccer footage captured over eight different illumination conditions, play types, and team attire. Our framework runs in near-real time, and processes video from 3 full HD cameras in about 0.4 seconds for each set of corresponding 3 frames.
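
    The fusion step can be illustrated with an exhaustive search that picks one detection per camera so that the cycle through them has minimum total weight. This is a brute-force stand-in, not the dynamic programming approach of the paper. The function name `min_weight_cycle`, the use of Euclidean distance between ground-plane positions as edge weight, the fixed camera order, and the omission of appearance cues are all assumptions.

```python
import itertools
import numpy as np

def min_weight_cycle(detections):
    """Pick one detection per camera minimizing the closed-cycle length.

    detections is a list of K arrays, each of shape (n_k, 2), holding the
    positions reported by one camera. Cameras are visited in list order.
    """
    K = len(detections)
    best_choice, best_weight = None, float("inf")
    for choice in itertools.product(*(range(len(d)) for d in detections)):
        pts = [detections[k][i] for k, i in enumerate(choice)]
        # Sum edge weights around the cycle, wrapping from the last camera to the first.
        weight = sum(np.linalg.norm(pts[k] - pts[(k + 1) % K]) for k in range(K))
        if weight < best_weight:
            best_choice, best_weight = choice, weight
    return best_choice, best_weight

# Two players seen by three cameras, listed in a different order by camera 1.
cam0 = np.array([[0.0, 0.0], [10.0, 10.0]])
cam1 = np.array([[10.1, 9.9], [0.1, -0.1]])
cam2 = np.array([[0.0, 0.2], [9.5, 10.0]])
choice, weight = min_weight_cycle([cam0, cam1, cam2])
```

    The search visits every combination of detections, so its cost grows exponentially with the number of cameras. That cost is what motivates a more efficient scheme such as the one the abstract describes.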

    Unsupervised Activity Discovery and Characterization for Sensor-Rich Environments

    This thesis presents an unsupervised method for discovering and analyzing the different kinds of activities in an active environment. Drawing from natural language processing, it introduces a novel representation of activities as bags of event n-grams, which captures the global structural information of activities through their local event statistics. It is demonstrated how maximal cliques in an undirected edge-weighted graph of activities can be used, in an unsupervised manner, to discover the different activity-classes. Building on work in computer networks and bioinformatics, it is shown how to characterize these discovered activity-classes from a holistic as well as a by-parts viewpoint. A definition of anomalous activities is formulated, along with a way to detect them based on the difference of an activity instance from each of the discovered activity-classes. Finally, an information-theoretic method to explain the detected anomalies in a human-interpretable form is presented. Results over extensive datasets collected from multiple active environments are presented to show the competence and generalizability of the proposed framework.
    M.S.
    Committee Chair: Aaron Bobick; Committee Member: Charles Isbell; Committee Member: Irfan Ess
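
    The bag-of-event-n-grams representation can be sketched in a few lines. The n-gram counting follows directly from the description above; the similarity measure (one minus the normalized count difference) and the function names are assumptions, since the abstract does not specify how activities are compared.

```python
from collections import Counter

def event_ngrams(events, n):
    """Count every contiguous length-n subsequence of an event sequence."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))

def ngram_similarity(a, b, n=3):
    """Similarity in [0, 1] between two activities (assumed measure)."""
    ca, cb = event_ngrams(a, n), event_ngrams(b, n)
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        return 0.0  # both sequences are shorter than n
    # Counter returns 0 for missing n-grams, so unmatched n-grams count fully.
    diff = sum(abs(ca[g] - cb[g]) for g in set(ca) | set(cb))
    return 1.0 - diff / total

counts = event_ngrams(list("abcab"), 2)
same = ngram_similarity([1, 2, 3, 4], [1, 2, 3, 4], n=2)
disjoint = ngram_similarity([1, 2, 3, 4], [5, 6, 7, 8], n=2)
```

    Pairwise similarities of this kind would supply the edge weights of the activity graph in which the cliques are then sought.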